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Abstract: For the univariate current status and, more generally, the interval censoring model, dis- 
tribution theory has been developed for the maximum likelihood estimator (MLE) and smoothed 
maximum likelihood estimator (SMLE) of the unknown distribution function, see, e.g., [12], [7], [4], 
[5], [6], [10], [11] and [8]. For the bivariate current status and interval censoring models distribution 
theory of this type is still absent and even the rate at which we can expect reasonable estimators to 
converge is unknown. 

We define a purely discrete plug-in estimator of the distribution function which locally converges at 
rate n 1 / 3 , and derive its (normal) limit distribution. Unlike the MLE or SMLE, this estimator is not 
a proper distribution function. Since the estimator is purely discrete, it demonstrates that the n 1 / 3 
convergence rate is in principle possible for the MLE, but whether this actually holds for the MLE 
is still an open problem. If the cube root n rate holds for the MLE, this would mean that the local 
1-dimensional rate of the MLE continues to hold in dimension 2, a (perhaps) somewhat surprising 
result. The simulation results do not seem to be in contradiction with this assumption, however. 

We compare the behavior of the plug-in estimator with the behavior of the MLE on a sieve and the 
SMLE in a simulation study. This indicates that the plug-in estimator and the SMLE have a smaller 
variance but a larger bias than the sieved MLE. The SMLE is conjectured to have a n 1 / 3 -rate of 
convergence if we use bandwidths of order n~ - 1 -/ 6 . We derive its (normal) limit distribution, using this 
assumption. Finally, we demonstrate the behavior of the MLE and SMLE for the bivariate interval 
censored data of [1], which have been discussed by many authors, see e.g., [18], [3], [2] and [15]. 

AMS 2000 subject classifications: Primary 62G05, 62N01; secondary 62G20. 

Keywords and phrases: bivariate current status, bivariate interval censoring, maximum likelihood 
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1. Introduction 

We consider the bivariate current status model, also called the bivariate interval censoring, case 1, model. 
This means that our observations consist of a quadruple (T, U, Ai , A 2 ), where 



and (X,Y) is independent of the observation (T,U). We want to estimate the distribution function F of 
the 'hidden' random vector (X,Y). 

A maximum likelihood estimator F n of Fq, the distribution function of (X, Y), maximizes the expression 



over all bivariate distribution functions F. Another formulation is that F n maximizees 



Ai 



1{x<t}, A 2 — l{y<(7}, 




n 



{ Aii A i2 log F(T h Ui) + A<i (1 - A i2 ) log {F(T h oo) - F{T U U)} 



+ (1 - Ad) A l2 log {F(oo, Ui) - F(Ti, U)} 



+ (1 - An) (1 - A i2 ) {1 - F(oo, Ui) ~ F(Ti, oo) + F(T l: U t )}} 




5 1 S 2 \ogF(u,v)dP n + / S^l -S 2 ) log {Fi(u) - F(u,v)} dF, 




n 



+ 



+ 




(i 



(i 




S 1 )(l - 5 2 ) log{l - *i(u) - F 2 (v) + F(u, v)} dP, 



n 



n 



1 



P. Groeneboom/ Bivariate current status 



2 



over F, where F% and F2 are the first and second marginal dfs of F, respectively, and P n is the empirical 
measure of the observations (Tj, f/j, A<i, A^), i = 1, . . . ,n. 
One looks for a solution of the form 

m m 

F„ = y^qjl[ T . [0O ), y^a 3 < 1, aij > 0, 1 < j < m = m„ , 

3 = 1 3=1 

where we denote by [rj,oo) an infinite rectangle [tji,oo) x [tj 2 ,oo), where r, = (^1, ^2)- Then the (Fenchel 
or Kuhn- Tucker) duality conditions for the solution are: 

Ma dw n+ r , dw n (1.2) 



i x3) F(u,v) y[ tl ,oo)x[o,t 2 ) - F{u,v) 



[o,ti)x[*2,oo) -FaM -.F(u,u) n J[o, tl )x[o,t 2 ) l-Fi(u)-i ;i 2(f) + ^(w,'y) 
< 1, 

for all t = (t 1 ,t 2 ) € M 2 , where P„ is the empirical df of the observations (5n,8i2,u i ,v i ). We must have 
equality in (1.2), if t = Tj, j = 1, . . . , m n , is a point of mass of the solution (i.e., the constraints are active!), 
where the rectangles [Tj, 00) are the generators of the solution. 

An R package, called 'MLEcens', written by Marloes Maathuis, is available for computing the MLE. 
The algorithm determines the maximal intersection rectangles where the MLE has mass via a preliminary 
reduction algorithm, and next computes the mass of the MLE in these rectangles, using the support reduction 
algorithm of [13]. The reduction algorithm is based on methods from graph theory and described in [15]. 
The R package uses as an example a data set, studied in [1], which is actually not of the current status type 
but has interval censoring, case 2, data. We shall discuss these data in section 5. The MLE for this data set 
is also discussed in section 7.3.3 of [18], who also refers to [3] and [2] for discussions of the computation of 
the MLE for this data set. The computation of the rectangles where the MLE puts mass is also treated in 
[17], where also minimax lower bounds for the estimation rate of the MLE and consistency of the MLE are 
derived. 

There is an extensive discussion on where to put the mass, once one has determined rectangles which can 
have positive mass, see, e.g. [18], section 7.3, [3], [2] and [15]. The algorithm for computing these rectangles, 
proposed in [15], seems at present to be the fastest. 

It is in our view somewhat doubtful whether all the energy spent on computing these maximal intersection 
rectangles and the ensuing question of whether one should place the mass of the MLE at the left lower corner 
or the right upper corner of the rectangles is really worth the effort. One could also specify in advance a set 
of points where one allows mass to be placed. In this way one obtains an MLE on a sieve, where the sieve 
consists of distributions having discrete mass at these points. The bottleneck in the computation of the MLE 
for the bivariate interval censoring problem is not the determination of the maximal intersection rectangles, 
but the computation of the mass the MLE puts on these rectangles, since there usually are very many! 

The latter phenomenon shows up in particular in simulations. As an example, simulating data from the 
distribution with density f{x,y) = x + y on the unit square, with a uniform observation distribution, we 
got for sample size n = 5000 about 5 • 10 5 possible rectangles where the masses could be placed, which is 
(at present) an almost prohibitive number if one wants to do simulations of the limit behavior of the MLE 
on an ordinary table computer or laptop. Ultimately, the discussion on these matters should in our view be 
determined by insights into the distribution theory of the MLE or the MLE on the chosen sieve. 

In section 2 we show that a purely discrete plug-in estimator locally attains the n 1 / 3 rate. We also 
determine its asymptotic (normal) distribution. In section 3 we study the local limit behavior of the smoothed 
maximum likelihood estimator and derive its asymptotic distribution under the assumption that it can 
asymptotically can be represented by an integral in the observation space, proceeding along similar lines as 
in the one-dimensional case (see, e.g., [5], [10] and [9]). Section 4 presents a simulation study, comparing 
the behavior of the MLE, the SMLE and the plug-in estimator. The results seem to be in accordance with 
Theorems 2.1 and 3.1 in sections 2 and 3, respectively, and also seem to be in accordance with the conjecture 
that the MLE locally attains rate n 1 / 3 . Section 5 extends the MLE and SMLE to a more general setting of 
interval censoring and applies the methods on a data set in [1], which has been discussed by many authors, 
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see e.g., [18], [3], [2] and [15]. The paper ends with some concluding remarks on faster rates for the SMLE, 
which can be attained if one uses higher order kernels. 

2. A purely discrete n 1 / 3 -rate estimator for the bivariate current status model 

Basically, the MLE for the 1-dimensional current status model is the monotone derivative of the cusum 
diagram 

(G n (t),V n (t)),t£l, V n (t)= [ 6d¥ n (u,6), 

Ju<t 

where I is the observation interval and Q n the empirical distribution function of the observations T±, . . . ,T„. 
So it can be considered to be a monotone version of the 'derivative' dV n (t) / d& n (t) . Note that if we replace 
V n and G n by their deterministic equivalents, the derivative becomes 

W) _ p m 

- mi 

so is indeed the object we want to estimate. 

For the simplest bivariate current status model, which is sometimes called the 'in-out' model, we only 
have the information whether the hidden variable is below and to the left of the observation point (Tj, Ui) 
or not. In this case we could also define 

V n {t,u)= { 5dP n (v,w,6), 

J v<t, w<u 

where <5=1 represents the situation that the hidden variable is below and to the left of (v,w). If the 
empirical observation distribution is again denoted by <G ra , we this time want to estimate the 'derivative' 
dV n (t, u)/dG n (t,u), since, replacing V n and G„ by their deterministic equivalents, the derivative becomes 

F (t,u)g(t,u) 

-— = F {t,u). 

g(t,u) 

So we want to find a version of the derivative dV n (t, u)/dG n (t, u), under the (shape) restriction that it is a 
bivariate distribution function. 

However, a natural cusum diagram for this situation does not seem to exist. But we can define a 2- 
dimcnsional 'Fenchel process', incorporating the duality conditions for a solution of the optimization problem. 
Analogously to the 1-dimensional current status model, the Fenchel duality conditions are: 

{5-F(v,w)} d¥ n {v,w,S) < 0, (2.1) 

v>t, w>u 

with equality if (i, u) is a point of mass of the solution. So we have to deal with a process 

/ F(v,w)dG n (v,w) (2.2) 

which has to lie above the process 

(t,u)^ [ Sd¥ n {v 7 w,S), 

J V > t , 10 > u 

with points of touch at points of mass of F. Denoting temporarily the process (2.2) by Q ni we get that 
the MLE can (formally) be denoted by dQ n (t,u) /dG n (t,u). Note, however, that the function Q n is not 
necessarily close to a convex or concave function, so here the analogy with 1-dimensional current status 
breaks down. But it must have the property that its 'derivative' w.r.t. dG n must be a distribution function, 
which is analogous to the fact that the derivative of the convex minorant of the cusum diagram must be a 
distribution function in the 1-dimensional case. 
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For the real current status model the situation is more complicated, since we then have to deal with 4 
regions instead of 2. From (1.2) we get: 

Si8 3 dp , f 6,(1-5,) 



t.oo) F n (u,v) J[t 1 ,oo)x[o,t 2 ) F n i(u) -F n (u,v) 

+ [ , dPn+ [ a d-W-^ dPn 

J[o,t!)x[t 2 ,oo) F„ 2 (w) - F n (u,«) y[o,ti)x[o,t 2 ) 1 - F nl (u) - F n2 {v) + F n (u,v) 

< 1, 

where t = (tj, £2), with equality if (ti, t 2 ) is a point of mass of i 7 ^. 

It has been conjectured that the MLE in the bivariate current status model converges locally at rate 
rt 1 / 3 , just as in the 1-dimensional current status model (with smooth underlying distribution functions). 
[17] proves a minimax lower bound of order n -1 / 3 . It would be somewhat surprising if the 1-dimensional 
rate would be preserved in dimension 2, since in general one gets lower rates for density estimators if the 
dimension gets up, and the estimation of the distribution function in the current status model is similar to 
density estimation problems, as argued above. 

To show that it is in principle possible that the MLE attains the local rate n 1 / 3 , we construct a purely 
discrete estimator (different from the MLE), converging locally at rate n 1 / 3 . We restrict ourselves for sim- 
plicity to distributions with support [0, l] 2 in the remainder of this section, but the generalization to more 
general rectangles is obvious. We have the following result, which is proved in the appendix. 

Theorem 2.1. Consider an interior point (t,u), and define the square A n , with midpoint (t,u), by: 

A n = [t - n~ 1/6 , * + n~ 1/6 ] x [u - n~ 1/6 , u + n~ 1/6 ]. 

Moreover, suppose that the observation distribution G is twice continuously differentiable at (t, u) with a 
strictly positive density g(t,u) at (t,u), and that F is twice continuously differentiable at (t,u). Then the 
estimator 

~ def J^S^ d¥ n (v,w, 6^62) 

Fn(t,u) = — — r , (2.3) 

} An d<G n (v,w) 

where Q n is the empirical distribution function of the observations (Tj, t/j) and ¥ n is the empirical distribution 
function of the observations 

(Ti,Ui, Aji, A i2 ) , i = l,...,n, 

satisfies: 

n 1 / 3 |i^„(£, u) — Fo(t, u)| N (/?, cr 2 ) , 
where N(j3, a 2 ) is a normal distribution with first moment 

d 1 F (t,u)d 1 g(t 7 u) + d 2 F (t,u)d 2 g(t,u) 



P = \{dlF {t,u) + dlF {t,u)} 



3g(t,u) 



ana variance 



a 2 = F Q (t,u){l-F (t,u)} 
Ag{t,u) 

We now allow as possible points of mass the points (i«i, u„i), . . . , (t n>nin , u n ^ mn ) running through a rect- 
angular grid, where the distances between the points on the x- and y-axis are of order n -1 / 3 , and define the 
estimate F n at each point (t n i,u n i) as in Theorem 2.1. Next we define the masses p n i at the points (t n i,u n i) 
by the equations 

^ ^ Pnj — F n (t n i 7 u n i ) , i — 1 , . . . , m n . 

Note that the estimate F n we obtain in this way is not necessarily a distribution function and that the masses 
p n j can have negative values. 
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(a) Plug-in estimator (b) MLE on the points of mass of the plug-in estimate 

Fig 1: The plug-in estimator F n and the MLE, for a sample of size n — 5000 of bivariate current status data, 
where the hidden variables have a distribution with density fo(x, y) = x + y, and the observation distribution 
is uniform on [0, l] 2 . 



A picture of F n , together with the MLE, computed on the sieve of points of mass of the plug-in estimator, is 
shown in Figure 1. The sieved MLE is a real (discrete) distribution function, so all its masses are nonnegative. 

Assuming the support of the distribution with function Fo to be [0, l] 2 , we treat the points near the upper 
and right boundary in a special way in computing F n . If, for example t n i > 1 — h n , where h n ~ nT 1 ^ and 
(i n j, u ni ) is a point of the grid, we put the Sji corresponding to the contribution of observations (Tj, Uj) such 
that Tj > 2 — t n i — h n , equal to 1. We treat the second coordinate in the same way. This reduces the bias we 
otherwise would get, with an underestimation of the distribution function near the right and upper boundary. 
The idea to treat the points near the boundary in this way is inspired by, but different from, the reflection 
method proposed by [16]. For points near the left and lower boundary, we follow a similar procedure. If, 
for example t < h n , where h n ~ n -1 / 6 and (t,u) is a point of the grid, we put the 8ji corresponding to 
the contribution of observations (Tj,Uj) such that Tj < h n — t, equal to 0. The bias near the boundary is 
actually of order 0(h n ) in this way and we do not attain the 0(/i 2 ) for interior points, though. 

One can also compute the isotonic projection of the plug-in estimator on the set of distribution functions 
by minimizing 

n 2 

J2{ F i T i,Ui) - F n (Ti,Ui)} Wi 
i=l 

over all distribution functions F, where the Wi are positive weights. In this way one gets a distribution 
function very close to F ni having generally more points of mass than the MLE. We will not pursue this 
direction in the present paper, though. 

3. The smoothed maximum likelihood estimator 

Throughout this section we will assume for simplicity that the support of the distribution of the 'unob- 
servables' is [0, l] 2 and that we want to estimate the corresponding distribution function Fo on [0, l] 2 . The 
generalization to more general rectangles is obvious. 

Let K be a symmetric non-negative kernel, for example the triweight kernel 



K(x) = §(l-x 2 ) 3 l [ _ hl] (x), 
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and Kh{x) = h^ 1 K(x/h), for h > 0. Moreover, let the integrated kernel K be defined by 

K{x) = f K(y) dy. 



We follow the approach for the 1-dimensional case, discussed in the references [4] to [10]. 

At an interior point (t, u), not too close to the boundary, the smoothed maximum likelihood estimator 
(SMLE) is just defined by 

Fil ML \t,u) = J ' K h {t-v)K h {u-w)dF n {v,w), K h (x) = K(x/h), (3.1) 

To prevent the negative bias at the right and upper boundary of the support, we also perform a correction 
by extending the definition of IK near the upper and right boundary by: 

/>oo 

K\x)= / K(y)dy (3.2) 

J X 

and denning 

Fn h ML \t,u)= ( {K h (t-v)+lK b h (2-t-v)}{K h (u-w) + M b h (2-u-w)} dF n (v,w), (3-3) 



and K^x) = h~ 1 lK b (x/h). This definition of the (integrated) boundary kernel is based on the reflection 
boundary correction method for density estimates, proposed in [16]. Note that the definitions (3.1) and (3.3) 
coincide if max(i, u) < 1 — h. 

We next define the score function in the hidden space: 

HtM x >y) = i K h(t -x)+ K b h {2 -t-y)} {K h {u -x) + K b (2 -u-y)}, 

Scores in the observation space are given by 

6 Fo (v,w,5 u 8 2 )=E{a(X,Y) | (T, U, A x , A 2 ) = (v,w, 5 U 5 2 )} , (3.4) 

where a is a score in the hidden space. We have, for example 

\ s ^ a (x, y) dFn(x, y) 
E {a(X, Y) (T, U, A 1; A 2 ) = (v, w, 1,1) = J ^»^ ) —L 

F (v,w) 

With this notation, we want to solve the equation 
E{0 Fo (T,U, A U A 2 ) | (X,Y) = (x,y)} 

9f (v, 1, 1) g{V) w) dv dw + / 6f (v, w> 1, 0) g(v, w) dv dw 

v>x, w>y J v>x, w<y 



0F o (v, w, 0, 1) g(v : w) dv dw + / 9p (v, w : 0, 0) g(v, w) dv dw 

v<x, w>y J v<x 7 w<y 

= K(t,u)(%,y)- (3-5) 

Defining 

(t>F (x,y)= a(v,w)dF {v,w), (3.6) 

J v<x, w<y 

where a is a score function in the hidden space, and differentiating (3.5) w.r.t. x and y, we now obtain the 
equation: 

(j>F {x,y) _ (j> Fa {x,l) - c/> Fo (x,y) _ (j> Fo (l,y) - Fo (x,y) _ c/) Fo {x,l) +<p Fo (l,y) - c/> Fo (x,y) 
F (x,y) F (x,l)-F (x,y) " F (l,y) - F (x,y) 1 - F (x, 1) - F (l,y) + F {x,y) 
, y y 1 d 2 H(t,u)(x,y) = {K h (t -x) + K h (2 -t-x)} {K h {u ~ y) + K h {2 -u-y)} 
dxdy g{x,y) 
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This equation has the solution 

<f>F (x,y) 

{K h {t -x) + K h (2 -t-x)} {K h {u -y) + K h {2 -u-y)} 



g(x,y) 

(3.7) 



111 1 



F (x,y) F (x,l) - F {x,y) F (l,y) - F (x,y) 1 - F (x, 1) - F (l, y) + F {x,y) 
Note that the solution satisfies: 

<M,(1,2/) = 4>f {x,1) = 0, x,ye [0,1]. 
This suggests that the asymptotic behavior of the SMLE is given by: 



J 9 Fo (x,y,S u S 2 )d(¥ n -P)(x,y,S 1 ,S 2 ), (3.8) 



where 



9 Fo (v,w,6 1 ,S 2 ) 

_ <5i^20F o (a;,y) - 5 2 )(j>F (x,y) (1 - 5i)5 2 <j>F {x,y) (1 - £i)(l - 5 2 )<p Fo (x, y) 



F (x,y) F (x,l) - F (x,y) F (l,y) - F {x,y) 1 - F (l,y) - F (x, 1) + F {x,y) ' 
leading at interior points (t, u) to an asymptotic variance, given by: 

1 f f 1 1 1 1 



n J(x,y)e[o,i¥ \ F o(x,y) F (x, 1) -F (x,y) F (l, y) - F (x, y) 1 - F (l, y) - F (x, 1) + F (x, y) 

{K h (t -x) + K h {2 - t - x)} 2 {K h (u -y)+ K h (2 -u- y)} 2 

g{x,y) 

, 2 , -i r i_ 1 i 1 

> \F (t,u) + F (t,l)-F (t,u) + F Q (l,t) - F (t,u) + 1 - F (l, u) - F (t, 1) + F (t, u) } 

■g^u^y K{vfdv} . 

For points (t, u) for which one or both coordinates approach the upper boundary the expression on the 
last line has to be replaced by 

■g^u)- 1 f 1 { K {v)+k( 2 -^+v\\ dv f Ik(w) + k(^^+w)} dw. 

Note that 



v=(t-l)/h I \ ^ /J Jv=(u-\)/h 



2-2t \ 2-2t 
K — — +v = 0, v > 1 - 



h ' J ' " h ' 

and that 

I' \ K (v)+k(^-^+v) \ dv= C K(v)dv+ I' t)/H K / 2 -' 21 

Jv=(t-l)/h I V " J J Jv={t-l)/h Jv=(t-l)/h 

= [ K(v)dv+ f K(v)dv = 1. 

Jv=(t-l)/h Jv=(l-t)/h 



+ v ) dv 



lv=(t-V)/h Jv=(l-t)/h 

using the fact that if is a symmetric kernel with support [— 1, 1]. Also note that if t < 1 — h we get: 
jf 1 I^K(v)+K (^—^+v^ dv = J 1 K{vfdv. 
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Assume for the moment that max(i, u) < 1 — h. Then the bias is given by: 
Kh{t — v)IKh{u — w)fo(v, w) dv dw — Fo(t, u) 

Kh(t — v) / fo[x, w) dxdv > JKh(u — w) dw — Fo(t, u) 



Kh(t — v)Kh{u — w)Fq(v, w) dv dw — Fo(t, u) 
= J K(v)K(w){F (t- hv,u- hw) - F (t,u)} dvdw 

= \{dlF a {t,u)+dlF {t,u)}h 2 ^J x 2 K{x)dx^ + o(h 2 ). 
With the boundary correction included, the expression for the bias becomes: 

\ {d 2 F (t,u) + d%F (t,u)} h 2 J x 2 S^K{x) + K + } da 

y 2 (K(y) + K + | dy + o (h 2 ) 



'(u-l)//i 

The SMLE is compared with the MLE in Figure 2. 





(a) MLE (b) SMLE 

Fig 2: The MLE and SMLE for for a sample of size n — 1000 of bivariate current status data, where the 
hidden variables have a distribution with density fo(x, y) = x + y, and the observation distribution is uniform 
on [0, l] 2 . 

Using the assumption that F n SML \t,u) — Fo(t,u) has the asymptotic representation (3.8), we get the 
following result. 

Theorem 3.1. Under the conditions of Theorem 2.1 we have for each point (t,u) 6 (0, l) 2 satisfying these 
conditions, if h n x n" 1 / 6 , 

« 1/3 {F { nZ L) (*, «) - Mi, u)}^N {fi, a 2 ) , 
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where N(j3, a 2 ) is a normal distribution with first moment 

/3 = \c{dlF (t,u) +d 2 F Q (t,u)} I J x 2 K(x)dx^ , c= lim n 1/3 /i 2 , 

and variance 



o 2 = cr 1 



1 1 



F (t,u) F (t,l)-F (t,u) F (l,t) - F (t,u) l-F (l,u)-F (t,l) + F (t,u) 

■g^uy 1 ( / K(v) 2 dv 



2 



Remark 3.1. Note that choosing h n x n- 1 ^ is the asymptotically optimal choice (modulo constants) since 
the variance is of order l/(nft, 2 ) and the bias of order ft, 2 , unless the bias is of order o(/i 2 ) (as happens when 
Fq is the uniform distribution function on [0, l] 2 ). 



4. A simulation study 

In order to compare the behavior of the three estimators, and in particular also to see whether the n^-rate 
is plausible for the MLE, we took 1000 samples from the distribution with distribution function 

Fo(%,y) = \xy{x + y), x,y e [0, l] 2 , 

and generated bivariate current status data from this with respect to the uniform distribution on [0, l] 2 . The 
samples size taken were n = 100, 500, 1000 and 5000, respectively. So we have observations (Ti, Ui) from the 
uniform distribution in [0,1] 2 , and for each such pair we get from the corresponding pair {Xi,Yi), drawn 
from F , independently w.r.t. (Tj,{/j), the indicators 

= l{Xi<Ti} and A i2 = I^y^i^}. 



Table 1 

Estimated values of the standard deviations times n 1 / 3 for three estimators of Fo(t,u) at a number of values of (t,u), where 
Fo(t,u) = 1 lu(t — u). The values corresponding to oo are the asymptotic values, deduced from Theorems 2.1 and 3.1. The 

asymptotic values for the MLE are unknown. 
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0.130 


0.107 


0.4 


100 


0.379 


0.212 


0.198 




500 


0.367 


0.194 


0.176 




1000 


0.360 


0.190 


0.179 




5000 


0.357 


0.178 


0.156 




oo 




0.182 


0.162 


0.6 


100 


0.475 


0.263 


0.226 




500 


0.412 


0.223 


0.221 




1000 


0.450 


0.238 


0.215 




5000 


0.441 


0.218 


0.205 




oo 




0.203 


0.206 


0.8 


100 


0.550 


0.276 


0.290 




500 


0.508 


0.253 


0.265 




1000 


0.503 


0.255 


0.266 




5000 


0.514 


0.236 


0.240 




oo 




0.183 


0.236 







u = 


: 0.6 




t 


n 


MLE 


MSLE 


Plug-in 


0.2 


100 


0.003 


0.043 


0.205 




500 


-0.001 


0.037 


0.263 




1000 


-0.037 


0.029 


0.264 




5000 


-0.0002 


0.030 


0.230 




oo 




0.044 


0.133 


0.4 


100 


-0.004 


0.061 


0.155 




500 


-0.006 


0.056 


0.169 




1000 


-0.022 


0.048 


0.164 




5000 


-0.006 


0.049 


0.167 




oo 




0.056 


0.166 


0.6 


100 


0.115 


0.099 


0.015 




500 


0.003 


0.084 


0.202 




1000 


-0.036 


0.080 


0.190 




5000 


0.007 


0.072 


0.188 




oo 




0.067 


0.200 


0.8 


100 


0.118 


0.124 


-0.152 




500 


-0.018 


0.112 


-0.078 




1000 


0.006 


0.117 


-0.090 




5000 


-0.0002 


0.116 


-0.006 




oo 




0.078 


0.233 



(a) n 1 / 3 times standard deviation 



(b) n 1//3 times bias 
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Since simulations for sample size 5000 with the 'real' MLE, based on the maximal intersection rectangles, 
would have taken prohibitively long, we instead computed the MLE on a sieve, constructed in the following 
way. For each sample of size n, we distributed m n = [n 2 / 3 ] points on the unit square, by letting their x- and 
y-coordinates by multiples of n -1 / 3 and permuting these coordinates randomly, according to the uniform 
distribution on permutations. Here [x] denotes the largest integer < x. 

So we start with order n 2 / 3 points on which the sieved MLE can place its mass, to which we add the 
vertices of the unit square to ensure a finite log likelihood, and compute the MLE which only is allowed to 
have mass at these points. This set of points is shown in Figure 3, corresponding to sample size n = 1000, 
and the reduced set of points on which the MLE actually puts positive mass is also shown in this picture. 




0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 



(a) Initial set of points of possible mass (b) Mass points of MLE 

Fig 3: A set of points on which the MLE is allowed to put its mass and the actual set of mass points of the 
MLE for a sample of size n = 1000. 

The rather different set of points on which the plug-in estimate puts its mass is shown in Figure 4. In 
fact, inspection of the set of points on which the MLE, based on the maximal intersection rectangles puts 
its mass, shows that that such sets are rather similar to Figure 3b and not similar to Figure 4. By the rather 
irregular structure of sets like Figure 3b the bias of the MLE is reduced w.r.t. the plug-in estimator which 
is constant on squares with sides of order n" 1 / 3 . We note, however, that the number of points in Figure 3a 
is the same as in Figure 4 (the number is 121). 

We took the bandwidth h n for the SMLE equal to n -1 / 6 and we also used this as binwidth for the plug- in 
estimator. Table la shows that there is no indication that n 1 / 3 times the standard deviation of the MLE is 
increasing with sample size, and Table lb suggests that the bias times n 1 / 3 is vanishing (as is also true in the 
one-dimensional case!), asn-> oo, so the hypothesis that the rate of convergence of the MLE is n 1 / 3 is not 
contradicted. The theory for the MSLE and plug-in estimators seems to be confirmed by the simulations, in 
particular at the points (0.4,0.6) and (0.6,0,6), where the boundary effects are still not active. Note that 
if, for example, n = 500, the bandwidths for the SMLE are equal to 500 -1 / 6 « 0.3549537, so the boundary 
kernel starts getting active for the points (0.2, 0.6) and (0.8, 0.6). This is still true for sample size n = 5000, 
where the bandwidth is 5000" 1/6 « 0.2418271. It is clear from Table la that the MLE has a bi ggcr variance 
than the other two estimators, but on the other hand the bias of the MLE is usually smaller than that of 
the other estimators. 

If the second order partial derivatives of the distribution function Fq vanish at (i,it), as happens, for 
example, with the uniform distribution, it is possible to achieve higher rates of convergence with the SMLE 
and the plug-in estimator. We demonstrate this for the uniform distribution function Fo, where we keep the 
bandwidth constant and equal to 0.4 for the SMLE and equal to 0.2 for the plug-in estimator (to avoid the 



P. Groeneboom/ Bivariate current status 



11 




Fig 4: The set of points on which the plug-in estimator puts its mass for n = 1000. 



Table 2 

Estimated values of the standard deviations times n 1 ' 3 for three estimators of Fo(t,u) at a number of values of (t,u), where 
Fo(t,u) = tu. The values corresponding to oo are the asymptotic values, deduced from Theorems 2.1 and 3.1. The asymptotic 

values for the MLE are unknown. 







u 


= 0.6 




t 


n 


MLE 


MSLE 


Plug-in 


0.2 


100 


0.371 


0.199 


0.388 




500 


0.367 


0.173 


0.286 




1000 


0.363 


0.145 


0.262 




5000 


0.360 


0.121 


0.195 




oo 










0.4 


100 


0.458 


0.266 


0.511 




500 


0.438 


0.220 


0.391 




1000 


0.447 


0.197 


0.332 




5000 


0.438 


0.159 


0.257 




oo 










0.6 


100 


0.480 


0.294 


0.576 




500 


0.486 


0.239 


0.433 




1000 


0.491 


0.221 


0.362 




5000 


0.470 


0.178 


0.301 




oo 










0.8 


100 


0.517 


0.303 


0.602 




500 


0.497 


0.242 


0.438 




1000 


0.490 


0.226 


0.386 




5000 


0.479 


0.177 


0.303 




oo 















u 


= 0.6 




t 


n 


MLE 


MSLE 


Plug-in 


0.2 


100 


-0.032 


-0.009 


-0.007 




500 


0.001 


0.037 


0.005 




1000 


-0.027 


-0.006 


-0.008 




5000 


0.006 


0.024 


-0.009 




oo 










0.4 


100 


-0.016 


-0.009 


-0.028 




500 


0.018 


0.007 


0.004 




1000 


-0.022 


-0.001 


0.001 




5000 


-0.008 


-0.003 


0.004 




oo 










0.6 


100 


0.106 


0.023 


-0.025 




500 


0.011 


0.022 


0.014 




1000 


-0.004 


0.005 


-0.008 




5000 


0.027 


0.010 


0.004 




oo 










0.8 


100 


0.092 


0.033 


-0.011 




500 


-0.015 


0.037 


-0.001 




1000 


0.012 


0.012 


0.008 




5000 


0.014 


-0.011 


0.006 




oo 











(a) n 1//3 times standard deviation 



(b) n 1 / 3 times bias 



boundary correction). In this case the bias times n 1 / 3 tends to zero for the SMLE and the plug- in estimator, 
see Table 2b. If we keep the bandwidth constant, the variance of the SMLE and plug-in estimator should be 
of order n _1 and these estimators should therefore actually attain a parametric rate of convergence in this 
case. We expect the MLE to have again rate n 1 / 3 in this case however, which is also (to a certain extent) 
suggested by Table 2. 
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5. More general bivariate interval censoring 

The purpose of this section is to show that the MLE and SMLE, discussed in the preceding sections in the 
context of the bivariate current status model, can also be used for more general interval censored data. As 
mentioned earlier, the current status model is the simplest case of the interval censoring model. For the 
bivariate interval censoring, case 2, model, the data are of the form 

(Tu, U a ,T i2 , U i2 , A£\ Ag\ A£\ Ag>) , i = 1, . . . ,n, 

where 

=Mx il <T il }, ^tt =l {T i i<X il <U il }- l = 1 {X i2 <T i2 }, = l{T s2 <X l2 <[/ l2 }, 

Defining 

A (3) _ 1 _ A (l) _ A (2) A (3) _ 1 _ A (l) _ A (2) 

and defining the corresponding generic values 5^ similarly, we can define the measure dVn^ by 

W> = tf[> 5? 8$ ^ d¥ n {t,u,v,w,S^\ 5V,6?\6?) , 1 < i,j < 3, 
and an MLE of F is then obtained by maximizing 

£(F) = f log F(t, v) dV^ + f log {F{u, w) - F(t, w) - F(u, v) + F(t, v)} dV^ 22) 

J J t<u, v<w 

+ I log {F(u, v) - F(t, v)} dV^ + f log {F(t, w) - F(t, v)} dV^ 

J u>t J w>v 

+ f log{F 1 (u)-F 1 (t)-F(u,w) + F(t,w)} dV^ 

Ju>t 

+ f \og{F 2 {w)-F 2 (v)-F(u,w) + F(u,v)} dV^ 

J W>V 

+ [ \og{F 1 (t)-F(t,w)}dV^ + [ \og{F 2 (v)-F(u,v)} dV^ 

J w>v J u>t 

+ J log{l - - F 2 {w) + F{u,w)} dV r i 33) (5.1) 

over bivariate distribution functions F. The Fenchel duality conditions become: 

dK (n) , f dV^ 



t>x, v>y F(t, V) + J t <x<u, v<y<w F (u, «0 - F(t, w) - F(u, v) + F(t, v) 

f dK (21) | f dK (12) 

Jt<x<u, v> y F{u, v) - F(t, v) J t > Xt v<y < w F(t, w) - F(t, v) 



+ 



J 

J t<X<U 



dV^ 



t<x<u, w > y Fi («) - Fi (t) - F(u, w) + F(t, w) 



J 



dV^ 



' ' v<v < w F 2 (w) - F 2 (v) - F(u,w) + F(u,v) 

dK (13) r dv^ 



+ 



t<x<u, v>y Jt>x, v<y<w F 2 (v) — F(u, v) 

(33) 



' u<x, w<y 

with equality if (x, y) is a point of mass of dF. 



f dVn 

+ Ju<x, w < v 1 - F^u) - F 2 (w) + F(u, w) - h (5 ' 2) 
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For computational purposes (but probably not for the development of distribution theory!) it is more 

(ij) 

convenient not to distinguish between the measures Vn and just to introduce rectangles to which the 
unobservable observations are known to belong, where we represent the (one-sided) unbounded rectangles 
by finite rectangles with upper or lower bounds outside the range of the observed data. In this set-up we 
simply have to maximize 

X>lGg#<P, ( 5 - 3 ) 

where p = (px> . . . ,p m )' is a vector of probability masses at possible points of mass (xj,yj) and Hi is a 
vector of length to, consisting of ones and zeros, where the component Hij is equal to 1 if the point (xj, yj) 
is contained in the rectangle 

[Ln, Rn] x [Li2, R12], 

and is zero, otherwise, and where the fa denote the multiplicities at the ith observation point. The masses 
pj should be nonnegative and sum to 1. This optimization can easily be handled by using iterative quadratic 
minimization and the support reduction algorithm, documented in [13]. In fact, the treatment is completely 
analogous to the treatment of the Aspect experiment for quantum statistics, discussed there. 

The data of [1] are given in Table 3, where the rectangles to which the hidden observations are known 
to belong are denoted by [L il; i? j;l ] x [L i2 ,Ri2\, i = l,...,n. The frequencies of the hidden observations 
belonging to these rectangles are given in the 5th and 10th column. There are 87 observation rectangles and 
the total sample size, taking the frequencies into account, is 204. The table is also given in [18], table 7.1, 
p. 165, but there the rectangles are slightly changed from the data in [1] by lowering the left bounds by 1 if 
they are larger than zero. Since we do not see a pressing reason for doing that, we just give the data here as 
they were given by [1] . If the upper bound Rij is unknown, we put Rij = 00 and if the lower bound is 
unknown, we put = —00. 

The maximal intersection rectangles where the MLE will put its mass are given in Table 4a. They can be 
computed, for example, by applying the reduction algorithm, used in the R package MLEcens. 

To facilitate the comparison with the existing literature, we will only discuss the MLE, based on the 
preliminary reduction to rectangles which can have mass, and not follow the procedure we used for computing 
the MLE on a sieve in the simulation from the density f(x, y) — x + y on [0, l] 2 . We will use the convention 
of putting the mass of the MLE in the right upper corner of these rectangles, and compute the MLE by 
the support reduction algorithm of [13]. The result is shown in table 4, where the masses of the MLE are 
given. It is seen that this is in close correspondence with Table 7.2 on p. 166 of [18], apart from the slightly 
different definition of the rectangles. The R package MLEcens also gives this result (in all 9 decimals). 

The SMLE for bivariate interval censoring if again defined by (3.1). A picture of the MLE and the SMLE 
is shown in Figure 5 and the picture of the level curves in Figure 7. For the meaning of the codings CMV 
and MAC, see [1] or [18], section 7.3. It can be seen from this picture that the steep increase of the first 
marginal df of the MLE and SMLE, shown in Figure 6, is due to the 'ridge' for the larger values of the second 
coordinate. The levels of both estimates are shown in Figure 7. It seems to me that the SMLE might be the 
more sensible estimate, also in view of the representational non-uniqueness of the MLE, which is somewhat 
'washed out' by the SMLE. 

6. Concluding remarks 

In the preceding, three estimators for the bivariate current status model were studied: the maximum likeli- 
hood estimator (MLE), which in the simulation study was restricted to the MLE on a sieve, the smoothed 
maximum likelihood estimator (SMLE), obtained by integrating a kernel w.r.t. the masses of the MLE, and a 
purely discrete plug-in estimator. All estimators seem to attain the n 1 / 3 rate, and for the SMLE and plug-in 
estimator the asymptotically normal distributions were specified. 

It might be somewhat surprising that the MLE and SMLE have the same rate, whereas the natural rates 
in the one-dimensional case are n 1 / 3 and n 2 / 5 , respectively, see [10]. But in the bivariate case the variance 
is of order l/(nh 2 ) and the bias of order h 2 , if a bandwidth h is taken in both directions and the kernel is of 
the usual symmetric and positive type. This makes the optimal choice of bandwidth of order n -1 / 6 , leading 
to a rate of order n 1 / 3 . 
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Table 3 
The Betensky- Finkelstein data 





Rn 


Li2 


Ri2 


frequency 




Li\ 


Rii 


L,2 


Ri2 


frequency 





3 





- 


3 




6 


- 


6 


- 


3 





3 


3 


- 


1 




6 


- 


9 


- 


1 





3 


6 


- 


3 




9 


- 





- 


2 





6 


6 


- 


1 




9 


- 


9 


- 


3 





3 


9 


- 


1 




9 


- 


12 


- 


1 





3 


12 


- 


5 




12 


— 





- 


5 





3 


15 


- 


5 




12 


— 


6 


- 


1 





6 


15 


- 


1 




12 


— 


9 


- 


4 


3 


3 


3 


- 


1 




12 


- 


12 


- 


10 


3 


3 


6 


- 


1 




15 


— 





- 


3 


3 


3 


9 


- 


3 




15 


— 


3 


- 


1 


3 


6 


9 


- 


2 




15 


- 


6 


- 


1 


3 


6 


12 


- 


3 




15 


- 


9 


- 


2 


3 


3 


15 


- 


2 




15 


- 


12 


- 


8 


3 


6 


15 


- 


2 




15 


- 


15 


- 


9 


3 


6 


18 


- 


1 




18 


- 





- 


1 


3 


3 


21 


- 


1 




18 


— 


6 


- 


1 


6 


6 





- 


2 




18 


- 


9 


- 


1 


6 


9 





- 


1 




18 


- 


12 


- 


1 


6 


9 


9 


- 


1 




18 


- 


15 


- 


3 


6 


6 


12 


- 


1 




18 


- 


18 


- 


6 


6 


9 


12 


- 


2 




21 


- 


15 


- 


1 


6 


6 


15 


- 


1 




- 








- 


9 


6 


9 


15 


- 


1 




- 





3 


- 


3 


6 


6 


18 


- 


1 




- 





6 


- 


10 


6 


9 


18 


- 


2 




- 





9 


- 


6 


9 


9 





- 


1 




- 





12 


- 


8 


9 


12 





- 


2 




- 





15 


- 


5 


9 


9 


9 


- 


2 




- 





18 


- 


4 


9 


12 


9 


- 


1 




- 





21 


- 


1 


9 


12 


12 


- 


3 







— 





3 


1 


9 


9 


15 


- 


1 




6 


- 





6 


1 


9 


12 


24 


- 


1 




6 


- 


6 


6 


1 


9 


9 


27 


— 


1 




12 


— 





3 


1 


12 


12 







1 




12 







6 


1 


12 


15 







1 




15 







3 


1 


12 


15 


6 




1 




21 




15 


15 


1 


12 


15 


15 




1 




3 









1 


12 


15 


21 




1 




9 









1 












6 




12 









1 


3 









2 







3 





6 


1 


6 









1 




3 


6 


6 


12 


1 


6 




3 




2 




9 


9 


9 


9 


1 












1 















It is possible to attain again the local n 2 / 5 rate in the bivariate case, but then one has to take recourse to 
higher order kernels K with the property 

J u 2 K(u) du = 0. 

One also has to take bandwidths of order n -1 / 10 instead of order n" 1 ^ in this case, which makes the 
judicious choice of boundary kernels even more important. Moreover, one has to strengthen the conditions of 
the theorems to the existence of 4th derivatives. However, if one is willing to do that, it is easy to strengthen 
Theorem 3.1 by letting the kernel IK be based on, for example, the kernel 

K x {u) - if (1 - u 2 f [llu 2 - 3)l[_ M] (u) 

which is the fourth order kernel corresponding to the Triweight kernel 

K(u) = §(l~u 2 ) 3 l hlA] (u). 
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Table 4 

Maximal intersection rectangles and masses of MLE 





Rjx 


L j2 


Rj2 




Lji 


L j2 


mass MLE 






















0.013676984 








21 


- 







21 


0.307533525 


3 


3 


21 


— 




3 


21 


0.087051863 


6 


6 


6 


6 




6 


6 


0.014940282 


6 


6 


18 






6 


18 


0.062521573 


9 


9 


9 


9 




9 


9 


0.010009349 


9 


9 


27 






9 


27 


0.071073995 


12 


12 










12 





0.004836043 


12 


12 


24 






12 


24 


0.053334241 


15 


15 










15 





0.042456241 


15 


15 


21 






15 


21 


0.021573343 


21 




15 


15 




21 


15 


0.044427509 


21 




18 






21 


18 


0.266565054 



(a) Canonical rectangles 



(b) Masses of MLE 




10 5 10 15 

CMV 



(a) MLE (b) SMLE 

Fig 5: MLE and SMLE for the Betensky-Finkelstein data, restricted to the interval [0, 18] x [0,24]. 

But one loses the property that the resulting estimate is necessarily a distribution function, since the kernel 
Ki is no longer positive. For the choice of higher order kernels, see, e.g., [14]. 

7. Appendix 

Proof of Theorem 2.1. We use the representation 

j An 5i52dP n (v,w, 61,52) 
J dG n (v,w) 

J An 5i6 2 d¥ n (v,w, 61,62) - E {f An 5i5 2 dF n (v,w,5i,52) | (Ti,Ui), i = 1, ... ,n| 

f An d<& n (v,w) 

E { J An 6162 d¥ n (v, w, 61,62) - F (t, u) I (T u Ui),i = l,..., n} 



J A d& n (v,w) 



(7.1) 
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0.7 1 I 




0.5 - ,-' 

0.4 - I T 7 ' 

0.3 - 

0.2 - / 

0.1 - 

o.o J 

I I I I I 

5 10 15 20 

Fig 6: The first marginal df of the data set in [1], computed on the interval [0, 21) (the largest observation 
point on the first coordinate is 21). The solid curve gives the first marginal df of the MLE and the dashed 
curve the first marginal of the SMLE, taking bandwidth rT 1 ^ . 




5 10 15 5 10 15 



(a) Level plot of the MLE (b) Level plot of the SMLE 

Fig 7: Contourplot of the MLE and SMLE for the Betensky-Finkelstein data, restricted to the interval 
[0,18] x [0,24]. 

The numerator of the first term on the right-hand side can be written: 

n 

n- 1 £ {A n A l2 - F {T U U t )} 1^,(2}, U,). 
»=i 

This is the sum of i.i.d. random variables, and 

var ({A u Aia - F Q {T U U^} l An {T u U x )) ~ 4n-^ a g(t, u)F [t, u) {1 - F (t, u)} , n ->■ to. 
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Hence: 

/ 5 1 5 2 d¥ n (v,w,5 1 ,S 2 ) - E \ J. S 1 S 2 dP n (v,w,S 1 ,S 2 ) \ (T^Ui), i = 1, . . . ,n\ 

n f w r < ^ M . A 

J^^ dh, n (v,w) 

where 

^ 2 = F (t, M ){l-F (t,u)} 
4ff(t,«) 

The numerator of the second term on the right-hand side of (7.1) can be written: 

n 

n ~ X Yl iV*' U i) ~ U )> 1a « ( T< ' U *)' 

i=l 

Hence we get: 

, -SfXi ^i5a <SPn(«, <*i, <5a) - F (t, u) | (T i( Ui),i=l,..., n\ 

n 1 ' 3 ^^ s 1 I {dtF (t,u) + d*F (t,u)} . 

J An dG n {v,w) 

Note that, putting h = h n ~ n. -1 ' 6 , 

- 1 ^ J B{F (r i ,C/ i )- J F (t,u)}U„(T i ,C/ i )= / {-FbM - ^o(*o,uo)} g(t,u)dtdu 

■ 1 «/ max! \t— trAAu— un\\<h 



n 



= diF (t Q ,u ) (t-t ) g(t,u)dtdu 

J max{|£— tolslu— uo\}<.h 

+ d 2 F (t ,u ) (u~u )g(t,u)dtdu 

J max{ \t — to I , \u— uq | }</i 

+ |^Fo(*o,«o) / (t-t ) 2 g(t,u)dtdu 

J max{ \t— to \ ,\u—uo\}<h 

+ ^d 2 F (t ,u ) / (u-u ) 2 g{t,u)dtdu + o[n^ 2,3 \ 

J max{ \t— to \ ,\u—ua\}<h 

= d 1 F (to,u )d 1 g(t ,u ) / (t~t Q ) 2 dtdu 

J max{ \t — to I ,\u— uq I }<h 

+ d 2 F (t , u )d 2 g(t 0l u ) / (u-u Q ) 2 dtdu 

J max{ \t— to \ ,\ u—uo\}<h 

+ ±d 2 F {t ,u ) f (t-t ) 2 g(t,u)dtdu 

J max{|t— tp I , I it— uo I }<h 

+ ld 2 F Q (t ,u ) / (u-u ) 2 g(t,u)dtdu + o(n- 2/3 ) 

J max{|t— to I ,\u— uq \y<h 

= I {<9iF (to, u )d 1 g(t Q , u ) + d 2 F (t , u )d 2 g(t Q ,u )} h 4 

+ § {d 2 F {t , uo) + d 2 F (t , u Q )}g{t a ,u )h 4 + (V 2 / 3 

Moreover, 

var (^n- 1 {M^i, Ui) - F (t, u)} U^uA 



= Oln- 1 {F (t,u) - F {t ,u )} 2 g(t,u)dtdu\ 

V J max{ |£ — to I ,\u— Uq \y<h J 



= O (n- 5 ? 6 
The result now follows. 
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